The Performance of Parity Placements in Disk Arrays

Authors

  • Edward K. Lee
  • Randy H. Katz

Abstract

Due to recent advances in CPU and memory system performance, I/O systems are increasingly limiting the performance of modern computer systems. Redundant Arrays of Inexpensive Disks (RAID) have been proposed by Patterson et al. [1] to meet the impending I/O crisis. RAIDs substitute many small inexpensive disks for a few large expensive disks to provide higher performance (both transfer rate and I/O rate), smaller footprints and lower power consumption at a lower cost than the large expensive disks they replace. Unfortunately, with so many small disks, media availability becomes a serious problem. RAIDs provide high availability by using parity encoding of data to survive disk failures. As will be shown, the way parity is distributed in a RAID has significant consequences for performance. In particular, we investigate the performance of eight different parity placements via simulation and show that these eight parity placements span a limited space of parity placements. We show that although the choice of parity placement does not significantly affect performance for small request sizes, it does significantly affect performance for relatively large request sizes of hundreds of kilobytes at low loads (by up to 20-30 percent for the disk array configurations that are typical today). We also propose properties that are generally desirable in parity placements.

1 Motivation

In recent years, improvements in CPU and memory system performance have greatly outpaced improvements in I/O performance. If the trend continues, future improvements in CPU and memory system performance will be wasted as computer systems become increasingly I/O bound. To overcome the impending I/O crisis, Patterson et al. have proposed Redundant Arrays of Inexpensive Disks (RAID) [1–3].
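The parity encoding mentioned above is what lets a RAID survive a disk failure. The following is a minimal Python sketch of the usual single-failure scheme, assuming bytewise XOR parity over the data blocks of one stripe (the text here does not spell out the encoding); the function names and the sample stripe are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    # Assumed encoding: the parity block of a stripe is the XOR of its data blocks.
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks: list[bytes]) -> bytes:
    # Every byte position XORs to zero over the full stripe (data + parity),
    # so XORing all surviving blocks yields the single missing block.
    return xor_blocks(surviving_blocks)


# Hypothetical stripe of three 4-byte data blocks, one per data disk.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = compute_parity(stripe)

# Simulate losing the disk that held stripe[1] and rebuild it.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost] + [parity]
recovered = reconstruct(survivors)
print(recovered == stripe[lost])  # True
```

Because XOR is its own inverse, the same routine serves for both encoding and reconstruction; losing the parity disk itself is handled by recomputing parity from the intact data.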
RAIDs substitute many small inexpensive disks for a few large expensive disks to provide higher performance (both transfer rate and I/O rate), smaller footprints and lower power consumption at a lower cost than the large expensive disks they replace. Unfortunately, with so many small disks, media availability becomes a serious problem. RAIDs provide high availability by using parity encoding of data to survive disk failures.

Patterson [1] and Chen [4] define six different RAID organizations:

  • RAID level 0: Non-redundant disk array. Only data striping is supported.
  • RAID level 1: Mirrored disk array. Data is duplicated for reliability.
  • RAID level 2: Hamming-coded disk array.
  • RAID level 3: Parity-protected disk array with byte-interleaved data. Reads access all disks except the parity disk, and writes access all disks. Only one I/O request may be serviced per parity disk at a time.
  • RAID level 4: Parity-protected disk array with block-interleaved data. Small reads access a single data disk, and small writes access a data disk and a parity disk. Several reads but only a single write per parity disk may be serviced concurrently.
  • RAID level 5: Parity-protected disk array with block-interleaved data and distributed parity. Similar to RAID level 4 except that the parity is distributed across all disks. Several reads and writes per parity disk may be serviced concurrently.

This paper investigates the performance implications of different ways of distributing parity (parity placements) in RAID level 5 disk arrays. Previous work in the field has been limited, dealing primarily with parity placements for specific applications or specific optimizations for currently known parity placements. Dibble [5] has investigated parity placements under the restriction that all writes are small, so that a large write is broken into many independent small writes. We do not assume such a restriction. Gray [6] has proposed a parity placement scheme in which parity is striped but the data is not.
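The RAID level 4 and level 5 descriptions differ mainly in where each stripe's parity block lives, and in both a small write touches only one data disk and one parity disk. The Python sketch below contrasts a fixed parity disk with one simple rotating placement; the rotation rule, the five-disk array, and all function names are illustrative assumptions, not necessarily one of the eight placements defined in Section 3. The small-write update uses the standard XOR read-modify-write identity: new parity = old parity XOR old data XOR new data.

```python
def raid4_parity_disk(stripe: int, num_disks: int) -> int:
    """RAID level 4: every stripe's parity is on the same dedicated disk."""
    return num_disks - 1


def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """RAID level 5 (assumed rotation): parity moves one disk to the left
    on each successive stripe, so it cycles over all disks."""
    return (num_disks - 1 - stripe) % num_disks


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: new parity = old parity XOR old data XOR new data.

    Only the target data disk and the stripe's parity disk are read and written.
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


NUM_DISKS = 5  # hypothetical array: 4 data blocks + 1 parity block per stripe

for name, placement in [("RAID 4", raid4_parity_disk), ("RAID 5", raid5_parity_disk)]:
    print(name, [placement(s, NUM_DISKS) for s in range(NUM_DISKS)])
# RAID 4 [4, 4, 4, 4, 4]
# RAID 5 [4, 3, 2, 1, 0]

# One stripe: four one-byte data blocks and their XOR parity (0x01^0x02^0x04^0x08).
data = [b"\x01", b"\x02", b"\x04", b"\x08"]
parity = b"\x0f"

# Small write to the third data block; the other data disks are not read.
new_block = b"\xff"
parity = small_write_parity(data[2], new_block, parity)
data[2] = new_block
print(parity.hex())  # f4
```

With the fixed RAID level 4 layout every small write also updates disk 4, which is why only one write per parity disk can proceed at a time; rotating the parity spreads those updates across all five disks.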
Menon [7] has proposed that the exact placement of the parity and data be varied dynamically in order to reduce rotational latency. This paper differs from those mentioned above by looking at parity placements within a more general framework.

Due to the relatively recent surge of interest in the area, most of the papers written about RAID level 5 disk arrays have either been unaware of the performance consequences of parity placements or have assumed that all parity placements result in equivalent performance. The result has been a dissemination of ad-hoc parity placements in the literature, encouraging commercial vendors to adopt arbitrary parity placements without performance justification. This is regrettable given the small difference in complexity between implementing one parity placement and another. Thus, if the performance trade-offs involved in selecting parity placements were known in advance, immediate performance benefits could be achieved at negligible cost.

Specifically, we investigate a limited space of parity placements and show that at relatively large request sizes, which are typical of scientific applications [8], the choice of parity placement results in significant differences in both read and write I/O performance (20-30 percent for the disk array configurations that are typical today). Based on our studies, we also propose properties that are desirable in parity placements in general. Finally, we conclude by making general recommendations for the selection of a parity placement for a given system. This paper is an extension of work previously reported [9].

Section 2 defines terms related to RAID that will be used throughout the paper and serves as a brief introduction to RAID. Section 3 defines eight parity placements and points out, for the first time, an important distinction between physical and logical parity placements.
Section 4 compares the simulated performance of the previously defined parity placements for homogeneous workloads consisting either of reads or of writes. For the two most promising parity placements, workloads consisting of a mixture of reads and writes are examined for the first time. Based on the simulation results, we then propose properties that are desirable in good parity placements. Section 5 argues that the eight parity placements examined are the only interesting placements within a space of parity placements. Finally, we conclude by summarizing our main results and making general recommendations for the selection of parity placements.

This work was heavily influenced by experiences gained in designing, developing, using and analyzing RAID systems. RAID-I, our first RAID prototype, was built using off-the-shelf hardware and has been in experimental use for the past year. The hardware components of RAID-II, a much more ambitious second prototype using custom hardware, are currently operational, and the system is undergoing software integration. RAID-II should be fully operational in the next six months.

2 Overview

This section introduces concepts and defines terms that will be used throughout the paper.

Journal:
  • IEEE Trans. Computers

Volume 42

Publication date: 1993